AITopics | bag-of-word data

Cross-Domain Matching for Bag-of-Words Data via Kernel Embeddings of Latent Distributions

Neural Information Processing Systems | Jan-16-2025, 18:50:35 GMT

We propose a kernel-based method for finding matches between instances across different domains, such as multilingual documents and images with annotations. Each instance is assumed to be represented as a multiset of features, e.g., a bag-of-words representation for documents. The major difficulty in finding cross-domain relationships is that the similarity between instances in different domains cannot be directly measured. To overcome this difficulty, the proposed method embeds all the features of different domains in a shared latent space, and regards each instance as a distribution of its own features in the shared latent space. To represent the distributions efficiently and nonparametrically, we employ the framework of the kernel embeddings of distributions.
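The idea in this abstract (each instance is a multiset of features, each feature has a vector in a shared latent space, and instances are compared through kernel embeddings of their feature distributions) can be illustrated with a small sketch. This is not the authors' implementation. The Gaussian kernel, the `gamma` parameter, the hand-picked 2-D latent vectors, and every function name below are assumptions made for illustration; the abstract does not say how the latent vectors are learned, so they are fixed by hand here. The sketch only shows how two bags from different vocabularies become comparable once their features live in one space, using the squared distance between empirical kernel mean embeddings.

```python
import math

def gaussian_kernel(x, y, gamma=1.0):
    """Gaussian (RBF) kernel between two latent vectors (assumed kernel choice)."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-0.5 * gamma * sq_dist)

def mean_kernel(bag_x, bag_y, gamma=1.0):
    """Inner product of the empirical kernel mean embeddings of two bags."""
    total = sum(gaussian_kernel(x, y, gamma) for x in bag_x for y in bag_y)
    return total / (len(bag_x) * len(bag_y))

def squared_mmd(bag_x, bag_y, gamma=1.0):
    """Squared distance between the two kernel mean embeddings (biased estimate)."""
    return (mean_kernel(bag_x, bag_x, gamma)
            - 2.0 * mean_kernel(bag_x, bag_y, gamma)
            + mean_kernel(bag_y, bag_y, gamma))

# Hypothetical latent vectors for two vocabularies in one shared 2-D space.
# A real model would learn these; they are hand-picked here.
latent_en = {"cat": (1.0, 0.0), "dog": (0.9, 0.2), "stock": (-1.0, 0.5)}
latent_ja = {"neko": (1.0, 0.1), "inu": (0.8, 0.1), "kabu": (-0.9, 0.6)}

def embed(words, table):
    """Map a bag of words (repeats allowed) to its list of latent vectors."""
    return [table[w] for w in words]

doc_en = embed(["cat", "dog", "cat"], latent_en)
doc_ja_pets = embed(["neko", "inu"], latent_ja)
doc_ja_finance = embed(["kabu", "kabu"], latent_ja)

d_pets = squared_mmd(doc_en, doc_ja_pets)
d_finance = squared_mmd(doc_en, doc_ja_finance)
print(d_pets < d_finance)  # the pet documents should be the closer pair
```

Under these made-up vectors, the English document is matched to whichever document in the other domain has the smallest distance. No kernel is ever evaluated directly between an English word and a Japanese word as raw tokens, only between their latent vectors.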

bag-of-word data, cross-domain matching, latent distribution, (3 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Natural Language > Text Processing (0.65)


Hierarchically Supervised Latent Dirichlet Allocation

Neural Information Processing Systems | Apr-6-2023, 12:53:25 GMT

We introduce hierarchically supervised latent Dirichlet allocation (HSLDA), a model for hierarchically and multiply labeled bag-of-word data. Examples of such data include web pages and their placement in directories, product descriptions and associated categories from product hierarchies, and free-text clinical records and their assigned diagnosis codes. Out-of-sample label prediction is the primary goal of this work, but improved lower-dimensional representations of the bag-of-word data are also of interest. We demonstrate HSLDA on large-scale data from clinical document labeling and retail product categorization tasks. We show that leveraging the structure from hierarchical labels improves out-of-sample label prediction substantially when compared to models that do not.

bag-of-word data, hierarchically supervised latent dirichlet allocation, hslda

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Natural Language > Text Processing (1.00)


Hierarchically Supervised Latent Dirichlet Allocation

Perotte, Adler J., Wood, Frank, Elhadad, Noemie, Bartlett, Nicholas

Neural Information Processing Systems | Feb-15-2020, 00:11:23 GMT

We introduce hierarchically supervised latent Dirichlet allocation (HSLDA), a model for hierarchically and multiply labeled bag-of-word data. Examples of such data include web pages and their placement in directories, product descriptions and associated categories from product hierarchies, and free-text clinical records and their assigned diagnosis codes. Out-of-sample label prediction is the primary goal of this work, but improved lower-dimensional representations of the bag-of-word data are also of interest. We demonstrate HSLDA on large-scale data from clinical document labeling and retail product categorization tasks. We show that leveraging the structure from hierarchical labels improves out-of-sample label prediction substantially when compared to models that do not.

bag-of-word data, hierarchically supervised latent dirichlet allocation, hslda

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Natural Language > Text Processing (1.00)


Cross-Domain Matching for Bag-of-Words Data via Kernel Embeddings of Latent Distributions

Yoshikawa, Yuya, Iwata, Tomoharu, Sawada, Hiroshi, Yamada, Takeshi

Neural Information Processing Systems | Feb-14-2020, 08:59:00 GMT

We propose a kernel-based method for finding matches between instances across different domains, such as multilingual documents and images with annotations. Each instance is assumed to be represented as a multiset of features, e.g., a bag-of-words representation for documents. The major difficulty in finding cross-domain relationships is that the similarity between instances in different domains cannot be directly measured. To overcome this difficulty, the proposed method embeds all the features of different domains in a shared latent space, and regards each instance as a distribution of its own features in the shared latent space. To represent the distributions efficiently and nonparametrically, we employ the framework of the kernel embeddings of distributions.

artificial intelligence, natural language, text processing, (6 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Natural Language > Text Processing (0.65)



